Mon. Not. R. Astron. Soc. , 000-000 (1997) 



Cluster mass estimation from lens magnification 



Eelco van Kampen 

Royal Observatory Edinburgh, Blackford Hill, Edinburgh EH9 3HJ 

Theoretical Astrophysics Center, Juliane Maries Vej 30, DK-2100 København Ø, Denmark, eelco@tac.dk 



Accepted ... Received in original form 



ABSTRACT 

The surface mass density of a cluster of galaxies, and thus its total mass, can be 
estimated from its lens magnification. The magnification can be determined from the 
variation in number counts of its background galaxies. In the weak lensing approxima- 
tion the surface mass density is a linear function of the magnification. However, most 
observational data is concentrated in the central parts of clusters, so one needs to go 
beyond the weak lensing approximation, and consider the lens shear as well, which 
is unknown from the variation in number counts alone. Our approach is to look for 
approximate relations between the lens shear and other lens properties in this strong 
lensing regime. 

Such relations exist for simple analytical cluster models, like the isothermal sphere, 
but are not generally a good description of observed or simulated galaxy clusters. We 
therefore study the lensing properties of a catalogue of numerical cluster models in 
order to find the best possible approximation for the shear which still allows straight- 
forward determination of the surface mass density. We show that by using such an 
approximation one can fairly well reconstruct the surface mass distribution from the 
magnification alone. The approximations are tested using clean magnification maps 
obtained directly from simulated clusters, and also using lensed mock background 
galaxy distributions in order to estimate the intrinsic uncertainties of the method. We 
demonstrate that the mass estimated using the weak lens magnification approxima- 
tion is usually at least twice the true mass. We illustrate our technique on existing 
data, and show that the resulting masses compare well to other estimates. 

Key words: cosmology: theory - dark matter - large-scale structure of Universe - 
gravitational lensing 



1 INTRODUCTION 

A rich cluster of galaxies acts as a gravitational lens on the 
galaxy distribution behind it. This simple fact can be used 
to derive a great deal about both the lensing cluster and
the background galaxy population (e.g. Schneider, Ehlers 
& Falco 1992; Fort & Mellier 1994; Kaiser 1996). In this 
paper we deal with methods that exploit the variation in 
galaxy number counts caused by the lensing cluster to ob- 
tain properties of that cluster, for example its total mass. 
Broadhurst, Taylor & Peacock (1995, BTP from here on) 
have shown how this variation in number counts of back- 
ground galaxies depends on the lens magnification, and how 
to best obtain the latter from the former. 

In order to obtain a total mass for the lens, or a mass 
distribution, one has to somehow derive the surface mass 
density from the lens magnification. BTP use the weak lens 
approximation to relate the two directly. However, this approximation
is only valid in the outskirts of clusters, while
most of the observational data is restricted to the central 
parts of clusters, where lensing is strong. We therefore need 
to go beyond the weak lensing approximation to realisti- 
cally estimate the cluster surface mass density in this strong 
lensing regime. This means that besides the magnification 
one needs the shear distribution in order to obtain the lens 
convergence, and thus the surface mass density of the lens. 

There are various ways of obtaining the shear distribu- 
tion from observations, like the method devised by Kaiser & 
Squires (1993) that utilizes the shearing of the background 
galaxy images to estimate the tangential shear component 
(but not the radial one). This method also provides a way 
to estimate the surface mass density, save a constant. The 
problem of this unknown constant, dubbed the 'sheet-mass 
degeneracy' (Gorenstein, Falco & Shapiro 1988), prevents 
absolute mass measures, although the edges of the observed 
field can be used to put a lower limit on the mass by re- 
quiring it to be positive everywhere. Furthermore, Kaiser 
& Squires (1993) again assume weak lensing. Extensions 
to these methods in order to break this degeneracy and/or 
consider the strong lensing regime have been devised by, 
amongst others, Schneider & Seitz (1995), Kaiser (1995), 
Schneider (1995) and Bartelmann et al. (1996). The relia- 
bility of these and other shear methods has been discussed 
by Bartelmann (1995) and Wilson, Cole & Frenk (1996). 

However, an approximation that relates the shear field 
to either the magnification or the convergence field allows 
one to obtain an absolute measure for the convergence, and 
thus the surface mass density, from the magnification alone. 
This also provides a mass estimate that is independent of 
other methods. Observationally, measuring the shapes of 
galaxies is more difficult than just counting them. Thus, a 
route to the cluster mass from the lens magnification alone 
is a major advantage: ground-based observations are suffi- 
cient, as imaging of the galaxies is not required. 

We use a sample of numerical galaxy cluster models 
(van Kampen & Katgert 1997) to find heuristic relations 
between lens shear and lens convergence (or lens magnifica- 
tion). We also consider some relations that have an under- 
lying assumption about the physical state of the lens, like 
isotropy. 

In order to find the maximum performance of the esti- 
mators that correspond to such approximations, we test how 
well one can estimate the surface mass density from just the 
magnification map, as obtained directly from a numerical 
cluster model. Subsequently, we test how well the estimators 
work on magnification maps that were obtained from lensed 
mock background galaxy distributions. We roughly follow 
the same procedures as an observer would for an observed 
galaxy distribution, thus mimicking most of the problems 
involved in the application of the method. 

Recently, Fort, Mellier & Dantel-Fort (1997) and Taylor 
et al. (1998) showed that a depletion in number counts can 
clearly be observed. Fort et al. (1997) showed this most 
convincingly for the cluster CL 0024+1654. They did not 
try to estimate the cluster surface mass density or the total 
mass, however. Taylor et al. (1998) did estimate a mass 
for A1689, and found it to be consistent with other mass 
measures. 

The paper is outlined as follows: in Section 2 we sketch 
the path from observed number counts to estimated cluster 
properties. The necessary approximations and methods are 
introduced in Section 3, and tested on simulated data in 
Section 4. We apply the technique to published data in the 
literature in Section 5. 

2 THE LENS MAGNIFICATION METHOD 

2.1 The thin lens approximation 

We summarize the main features of the thin lens approxima- 
tion, following Schneider, Ehlers & Falco (1992) and Bartel- 
mann & Weiss (1994), paying attention to those elements 
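that are important for this paper. For properly renormalized
mass and length scales (see Bartelmann & Weiss 1994
for details), the lens equation becomes

$$ \mathbf{y} = \mathbf{x} - \boldsymbol{\alpha}(\mathbf{x}) , \qquad (1) $$

where $\mathbf{x}$ and $\mathbf{y}$ represent positions in the lens and source planes respectively,
and $\boldsymbol{\alpha}$ is the deflection angle, being the gradient of
the lens potential $\psi$, given by

$$ \boldsymbol{\alpha}(\mathbf{x}) = \nabla\psi(\mathbf{x}) = (\kappa * \mathbf{K})(\mathbf{x}) . \qquad (2) $$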



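Here we have introduced the lens convergence $\kappa$, being the
dimensionless surface mass density $\Sigma(\xi_0 \mathbf{x})$ of the lens (where
$\xi_0$ scales the dimensionless $\mathbf{x}$ to a dimensional quantity):

$$ \kappa(\mathbf{x}) = \Sigma(\xi_0 \mathbf{x}) / \Sigma_{\rm cr} , \qquad (3) $$

and the kernel

$$ \mathbf{K}(\mathbf{x}) = \frac{\mathbf{x}}{\pi |\mathbf{x}|^2} , \qquad (4) $$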



with which $\kappa$ is convolved to obtain the deflection angle. The
critical surface mass density $\Sigma_{\rm cr}$ plays an important role in
lensing theory. It is defined as

$$ \Sigma_{\rm cr} = \rho_{\rm cr} \frac{c}{H_0} f(z_{\rm d}, z_{\rm s}) , \qquad (5) $$



where $\rho_{\rm cr}$ is the critical density of the background universe,
and $f(z_{\rm d}, z_{\rm s})$ a function of the lens redshift $z_{\rm d}$ and the source
redshift $z_{\rm s}$, whose expression depends on the geometry of
the universe, i.e. the cosmological model (see Schneider et
al. 1992, ch. 5). From the deflection angle we can calculate
the lens magnification and shear through the Jacobian $A$ of
the lens mapping:



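$$ A(\mathbf{x}) = \frac{\partial \mathbf{y}}{\partial \mathbf{x}} = I - \frac{\partial \boldsymbol{\alpha}(\mathbf{x})}{\partial \mathbf{x}} . \qquad (6) $$

The lens magnification $\mu$ is obtained from its determinant:

$$ \mu^{-1}(\mathbf{x}) = \det A(\mathbf{x}) = A_{11} A_{22} - A_{12} A_{21} . \qquad (7) $$

The lens shear consists of the two trace-free components

$$ \gamma_1(\mathbf{x}) = \frac{1}{2}(A_{22} - A_{11}) , \quad \gamma_2(\mathbf{x}) = -A_{12} = -A_{21} , \qquad (8) $$

with the total shear given by

$$ \gamma^2 = \gamma_1^2 + \gamma_2^2 = \frac{1}{4}(A_{22} - A_{11})^2 + A_{12} A_{21} = \frac{1}{4}(A_{11} + A_{22})^2 - \mu^{-1} . \qquad (9) $$

The convergence can also be expressed in terms of $A$ through
the Poisson equation for the lensing potential:

$$ \kappa(\mathbf{x}) = \frac{1}{2} \nabla \cdot \boldsymbol{\alpha} = 1 - \frac{1}{2} A_{11} - \frac{1}{2} A_{22} . \qquad (10) $$

We can thus relate the three main lensing properties:

$$ \mu^{-1} = (1 - \kappa)^2 - \gamma^2 . \qquad (11) $$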
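
As a quick numerical check of eqs. (7) to (11), the following minimal sketch evaluates the convergence, shear and inverse magnification from a 2x2 Jacobian and confirms that they satisfy eq. (11). The function name `lens_quantities` and the example matrix are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

def lens_quantities(A):
    """Return (kappa, gamma, inverse magnification) for a 2x2 Jacobian A."""
    A = np.asarray(A, dtype=float)
    kappa = 1.0 - 0.5 * (A[0, 0] + A[1, 1])   # eq. (10)
    gamma1 = 0.5 * (A[1, 1] - A[0, 0])        # eq. (8)
    gamma2 = -A[0, 1]                         # eq. (8); A is assumed symmetric
    gamma = np.hypot(gamma1, gamma2)          # total shear, eq. (9)
    inv_mu = np.linalg.det(A)                 # eq. (7)
    return kappa, gamma, inv_mu

# Made-up symmetric Jacobian, for illustration only.
A_example = np.array([[0.7, -0.1],
                      [-0.1, 0.9]])
kappa, gamma, inv_mu = lens_quantities(A_example)

# Eq. (11): the two expressions for the inverse magnification agree.
print(inv_mu, (1.0 - kappa) ** 2 - gamma ** 2)
```

Working from the Jacobian components rather than from the potential keeps the check independent of how the deflection field was computed.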



2.2 Cluster mass estimation from variations 
in number counts 

We sketch the whole path, in three distinct stages, from 
observed number counts to estimated cluster properties, no- 
tably the total mass. Besides describing the method, we 
indicate where we need to make assumptions. 

2.2.1 Variation in number counts to magnification 

The presence of a lens gives rise to a variation in the number
counts of background galaxies (BTP). For galaxies counted
up to a magnitude limit $m_{\rm lim}$, we denote this variation as
$N/N_0$, where $N_0$ is the average galaxy number count for
the field. Assuming that the integral luminosity function
of these galaxies can for $m < m_{\rm lim}$ be approximated by
a power-law with slope $S$, the magnification $\mu$ can be calculated
from $N/N_0$ and a maximum-likelihood analysis of
the background redshift distribution (BTP), taking into account
the clustering of background galaxies which confuses
the lensing signal (Taylor & Dye 1998). In case no redshift
information is available for the background galaxies, an estimate
for the absolute value of $\mu$ is given by

$$ |\mu_{\rm est}| = (N/N_0)^{\beta} , \quad {\rm with} \quad \beta = (2.5 S - 1)^{-1} . \qquad (12) $$

In most cases $\beta$ is negative, i.e. we observe a depletion in
number counts due to the presence of the cluster. Note that
we can only measure the absolute value of $\mu$, so we have to
set its sign, the image parity, by hand.
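
The count-to-magnification step of eq. (12) can be sketched as follows. This is a minimal illustration, assuming the power-law form $|\mu| = (N/N_0)^{\beta}$ with $\beta = 1/(2.5S - 1)$; the function names and the slope value $S = 0.2$ are hypothetical and chosen only to give round numbers.

```python
import numpy as np

def beta_from_slope(S):
    """Exponent beta of eq. (12) for a number-count slope S."""
    # beta diverges at S = 0.4, where lensing leaves the counts unchanged.
    return 1.0 / (2.5 * S - 1.0)

def abs_magnification(count_ratio, S):
    """Estimate |mu| from the count ratio N/N_0 (no parity information)."""
    count_ratio = np.asarray(count_ratio, dtype=float)
    return count_ratio ** beta_from_slope(S)

# Hypothetical example: a slope of 0.2 gives beta = -2, so a depletion
# of the counts to half the field value corresponds to |mu| = 4.
S_example = 0.2
print(beta_from_slope(S_example))
print(abs_magnification(0.5, S_example))
```

The sign of $\mu$ is not recovered here; it has to be supplied separately as the image parity.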

Although the feasibility of actually obtaining $\mu$ from 
observational data is an interesting topic for discussion, the 
issue of this paper is how to proceed from the magnification 
to the properties of the lensing cluster. We therefore assume 
for the remainder of this paper that one can reliably measure 
the lens magnification, as it has been shown to be detectable 
(Fort, Mellier & Dantel-Fort 1997; Taylor et al. 1998). 

2.2.2 Magnification to convergence 

As is obvious from eq. (11), in order to obtain $\kappa$ one needs
to know both the lens magnification and shear. As the lens
shear can be found from the distortion of the shapes of the
background galaxies (Kaiser & Squires 1993), one could, in
principle, combine this with the magnification found from
the variation in number counts to calculate the convergence.
However, here we prefer to use the latter as an independent determination
of the convergence, as one can also derive $\kappa$, save
a constant, directly from the lens shear (Kaiser & Squires
1993). Just detecting galaxies is also easier than obtaining
their shapes, can be done in larger numbers, and from the
ground. In order to get $\kappa$ from the lens magnification alone,
we need to make assumptions about the shear field. We do
this by finding approximate relations for simulated clusters,
which have known lensing properties. Once we have found
a relation between the shear and either the magnification or
the convergence, we can also go straight from the variation
in number counts to an estimate for the convergence.

2.2.3 Convergence to projected cluster mass and other 
properties 

The convergence of a lens depends on the redshift of the 
background galaxy being lensed, so for a distribution of 
background galaxies one finds a weighted average over con- 
vergences for each of these galaxies. In order to translate 
this convergence to a surface mass density we just multiply 
by $\Sigma_{\rm cr}$ for an effective source redshift, which depends on the 
redshift distribution adopted. The total projected mass is 
then found by integration over the surface mass density. 

Many other methods for cluster mass estimation give 
3D masses. We can derive these using a relation between 
2D and 3D masses for the model cluster catalogue of van 
Kampen & Katgert (1997), which is given as a fit to the 
scatter plot for all models (van Kampen 1998): 

$$ \frac{M_{\rm 3D}}{M_{\rm 2D}} = 0.56 \tan^{-1}\left( \frac{R}{0.14\,h^{-1}{\rm Mpc}} \right) . \qquad (13) $$

The scatter around this relation is fairly large for small $R$,
due to substructure along the line-of-sight, but less than 10
per cent for $R > 0.4\,h^{-1}$Mpc (van Kampen 1998).
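
The last stage, from convergence to mass, can be sketched as below. This is only an illustration under stated assumptions: the fit of eq. (13) is taken to read $M_{\rm 3D}/M_{\rm 2D} = 0.56 \tan^{-1}[R/(0.14\,h^{-1}{\rm Mpc})]$, the convergence is given on a regular pixel grid, and all function names and input values are hypothetical.

```python
import numpy as np

def projected_mass(kappa_map, sigma_cr, pixel_area):
    """Projected mass: sum of Sigma = kappa * Sigma_cr over the pixels."""
    return sigma_cr * np.sum(kappa_map) * pixel_area

def mass_ratio_3d_to_2d(R, R0=0.14, amplitude=0.56):
    """Assumed form of the fit of eq. (13); R and R0 in h^-1 Mpc."""
    return amplitude * np.arctan(np.asarray(R, dtype=float) / R0)

# Made-up inputs: a uniform kappa = 0.5 patch of 4x4 pixels,
# Sigma_cr = 2 and a pixel area of 0.25 in arbitrary consistent units.
m2d = projected_mass(np.full((4, 4), 0.5), sigma_cr=2.0, pixel_area=0.25)
m3d = mass_ratio_3d_to_2d(0.5) * m2d
print(m2d, m3d)
```

The ratio tends to $0.56\,\pi/2 \approx 0.88$ for large $R$, so under this assumed form the 3D mass always stays below the projected one.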



3 FINDING AN OPTIMAL 
CONVERGENCE ESTIMATOR 

3.1 General strategy 

The convergence $\kappa$ of a lens, from which we can obtain its
mass, is not just a function of the lens magnification $\mu$ that
we measure, but also of its lens shear $\gamma$ (see eq. 11). One way
to eliminate the dependence on $\gamma$ is to find an approximate
relation between $\gamma$ and $\kappa$. This can be done by looking more
closely at the Jacobian of the lens mapping, $A$. All three
main lensing properties are a function of two or more of its
components. One can therefore try to statistically relate
these components to each other. For example, if we assume
that $A_{11} = A_{22}$, and both $A_{12}$ and $A_{21}$ vanish, we have
$\gamma = 0$ and $\mu^{-1} = (\kappa - 1)^2$ (discussed in more detail below).

Another approach is to start from eq. (11), and assume
an arbitrary local relation $\gamma(\kappa)$, i.e.

$$ \mu^{-1}(\kappa) = (1 - \kappa)^2 - \gamma^2(\kappa) . \qquad (14) $$

For a typically aspherical and clumpy cluster, the conver- 
gence has a strong dipole component, while the shear is 
dominated by a quadrupole component. In other words, 
only specific lensing potentials will satisfy such a relation 
exactly. However, such a relation can be a good approxima- 
tion for averaged quantities, like radial profiles, for example. 
Also, when these functions are smoothed significantly, as is 
often the case for observational data, approximate local re- 
lations should exist. 

The assumption of locality of the shear allows many
possible specific approximations, so the aim is to select either
a physically motivated $\gamma(\kappa)$, or a $\gamma(\kappa)$ that leads to a relation
between $\mu$ and $\kappa$ which is easily invertible, and therefore
applicable to observations. We use a representative sample 
of cluster models to find such a relation, by simply investi- 
gating how shear and convergence relate to each other for 
these models. 

One problem we will always have to deal with is that of
image parity, the sign of the lens magnification, as we can
only obtain $|\mu|$ from the observed number counts, and have
to make an educated guess about this parity. In the general
case we have a second parity as well, as the inverse magnification is a
quadratic function of $\kappa$ and $\gamma$. In looking for a local relation
between $\kappa$ and $\gamma$ we should therefore try to minimize the
range of $\mu$ for which we have to set parities.

In devising approximations we need to take care that
the shear $\gamma$ remains real for all $\mu$. Furthermore, we set
$\mu^{-1}(0) = 1$, i.e. for $\kappa \to 0$ we also have $\gamma \to 0$, which
corresponds to saying that the cluster is isolated.

The various possibilities, along with their physical interpretation,
are discussed below. Both $\kappa$ and $\gamma$ are labelled to
denote the approximation made, while $\mu$ and $N/N_0$ appear
unindexed, as we consider them to be observed functions
for the sake of this paper. We first consider approximations
with just one parity.

3.2 Estimators with one parity 

Figure 1. Lens magnification $\mu$ as a function of convergence
$\kappa$ and shear $\gamma$. Several approximations relating $\kappa$ and $\gamma$ locally
are indicated as lines on the surface, where black lines indicate
the one-parity approximations $\gamma = 0$ (dashed line) and $\gamma = \kappa$
(solid line), described in Section 3.2, and white lines the two-parity
approximations $\gamma \propto \kappa$ (dashed line) and $\gamma \propto \kappa^{1/2}$ (dotted
line), described in Section 3.3. The caustics are indicated by large
dots.

Estimators with only one parity which also have physical
shear distributions are special cases of eq. (11): the expression
$(\kappa - 1)^2 - \gamma^2$ is allowed to have a sign uncertainty, but
its associated parity should be the same as that of $\mu$ in
order to effectively have one parity only. This means that
the following simple possibilities remain:

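$$ \begin{array}{ll} \mu^{-1} = (\kappa - 1)^2 & (\gamma = 0) \\ \mu^{-1} = 1 - 2\kappa & (\gamma = \kappa) \\ \mu^{-1} = 1 - \gamma^2 & (\kappa = 0) \end{array} \qquad (15) $$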

If we try $\mu^{-1} = (\kappa - q)^2$, then $\gamma^2 = 1 - q^2 - 2(1 - q)\kappa$,
which is non-negative for all $\kappa$ only for $q = 1$, the first possibility
listed above. The last possibility is obviously not very likely in reality.

In Fig. 1 we have plotted $\mu^{-1}$ as a function of both
$\kappa$ and $\gamma$. On this surface of possible $(\kappa, \gamma, \mu^{-1})$, we have
drawn the $\gamma = \kappa$ (solid black line) and $\gamma = 0$ (dashed black
line) approximations. This shows why only these two remain
when we require the one-parity approximations to go
through $(\kappa, \gamma, \mu^{-1}) = (0, 0, 1)$.

More complicated functions $\mu^{-1}(\kappa)$ can be proposed,
of course, which start at $(0, 0, 1)$, and cross the $\mu^{-1} = 0$
plane only once. But all of these also lead to complicated
(and probably multi-valued) expressions for $\gamma(\kappa)$, and, more
importantly, inversion of $\mu^{-1}(\kappa)$ becomes less straightforward.

3.2.1 No shear: $\gamma = 0$

The first of the two one-parity approximations is also the
simplest: we forget about shear altogether, which corresponds
to treating the cluster as a uniform sheet of matter.
This means that we set $A_{11} = A_{22}$, and $A_{12} = A_{21} = 0$.
Setting $\gamma = 0$, the estimator is easily derived from eq. (11):

$$ \kappa_0 = 1 - \mathcal{P} |\mu|^{-1/2} = 1 - \mathcal{P} (N/N_0)^{-\beta/2} , \qquad (16) $$

where $\mathcal{P}$ is the image parity, i.e. the sign of $\mu$. In this approximation
there is one critical line, at $\kappa = 1$, which separates
the two parity regimes: $\mathcal{P} = 1$ for $\kappa < 1$, and $\mathcal{P} = -1$ for
$\kappa > 1$.

For observational data $\kappa$ is of course a priori unknown,
but the position of the critical line can usually be guessed
from the occurrence of giant arcs, or the position of a significant
dip in the number counts, most easily in number
counts in spherical bins, but also in 2D maps.
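
The shearless estimator of eq. (16) can be sketched as follows, assuming the form $\kappa_0 = 1 - \mathcal{P}(N/N_0)^{-\beta/2}$. The function name, the default parity and the example numbers are illustrative assumptions; in practice the parity has to be set per region from the guessed position of the critical line.

```python
import numpy as np

def kappa_no_shear(count_ratio, beta, parity=1):
    """Shearless estimator kappa_0 of eq. (16).

    parity is +1 outside the critical line (kappa < 1) and -1 inside it.
    """
    count_ratio = np.asarray(count_ratio, dtype=float)
    return 1.0 - parity * count_ratio ** (-0.5 * beta)

# Hypothetical numbers: N/N_0 = 0.5 with beta = -2, i.e. |mu| = 4.
print(kappa_no_shear(0.5, -2.0, parity=1))    # outside the critical line
print(kappa_no_shear(0.5, -2.0, parity=-1))   # inside the critical line
```

The same count ratio maps onto two convergence values, one on each side of $\kappa = 1$, which is why the parity must be chosen by hand.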

3.2.2 Isotropic approximation: $\gamma = \kappa$

BTP argue that if the fluctuations around the mean lensing
potential are reasonably isotropic, then

$$ \langle (1 - A_{11})^2 \rangle \approx \langle (1 - A_{22})^2 \rangle \approx \langle (A_{12})^2 \rangle \qquad (17) $$

(note that there is a typographical error in their eq. 19). If
we make this exact, i.e. $1 - A_{11} = 1 - A_{22} = A_{12} = A_{21}$,
we find that $\gamma_1 = 0$, $\gamma_2^2 = \gamma^2 = \kappa^2 = (1 - A_{11})^2$ etc., and
$\mu^{-1} = 1 - 2\kappa$.

So, assuming that the shear is equal to the convergence
results in the estimator

$$ \kappa_1 = \frac{1}{2}\left(1 - \mathcal{P} |\mu|^{-1}\right) = \frac{1}{2}\left[1 - \mathcal{P} \left(\frac{N}{N_0}\right)^{-\beta}\right] , \qquad (18) $$

where $\mathcal{P}$ is again the image parity. The critical line is now
assumed to be at $\kappa = 1/2$, so this estimator automatically
gives a smaller mass than the shearless mass estimator which
assumes $\kappa = 1$ at the critical line. This makes physical sense,
as there is now a shear contribution to the lensing, while in
the shearless case $\kappa$ has to account for all the magnification.
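
A matching sketch for the isotropic estimator, assuming the form $\kappa_1 = \frac{1}{2}[1 - \mathcal{P}(N/N_0)^{-\beta}]$ that follows from $\mu^{-1} = 1 - 2\kappa$. The function name and the example values are again hypothetical.

```python
import numpy as np

def kappa_isotropic(count_ratio, beta, parity=1):
    """Isotropic (gamma = kappa) estimator, from mu^-1 = 1 - 2 kappa.

    parity is +1 for kappa < 1/2 and -1 for kappa > 1/2.
    """
    count_ratio = np.asarray(count_ratio, dtype=float)
    return 0.5 * (1.0 - parity * count_ratio ** (-beta))

# Hypothetical numbers: N/N_0 = 0.5 with beta = -2, i.e. |mu| = 4.
print(kappa_isotropic(0.5, -2.0, parity=1))
print(kappa_isotropic(0.5, -2.0, parity=-1))
```

For these inputs the positive-parity value is 0.375, below the value of 0.5 that the shearless form gives for the same $|\mu| = 4$, in line with the remark that part of the magnification is now attributed to shear.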

3.2.3 Linear approximation 

All approximations discussed so far can be linearized to the
simple form

$$ \kappa_{\rm lin} = \frac{1}{2}(\mu - 1) = \frac{1}{2}\left[\left(\frac{N}{N_0}\right)^{\beta} - 1\right] , \qquad (19) $$

a form used by BTP and others. This approximation is obviously
useful for $\mu \approx 1$ only, corresponding to small $\kappa$, i.e.
the outskirts of clusters. However, most observational data
is restricted to the core of the cluster only, because of the
limited field of view (for the Hubble Space Telescope for example),
or because the data was taken for other reasons.
This renders the linear approximation quite useless for our
purposes. We will show it for reference only in the remainder
of this paper. Note that the linear approximation has
no critical curves and no parity changes, as $\mu$, being linear
in $\kappa$, will never become zero; it will only increase
towards the centre.
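
To see how quickly the linear form departs from the two one-parity estimators, the sketch below tabulates all three for a few positive-parity magnifications. It assumes the forms $\kappa_{\rm lin} = (\mu - 1)/2$, $\kappa_0 = 1 - \mu^{-1/2}$ and $\kappa_1 = (1 - \mu^{-1})/2$; the function names and the chosen $\mu$ values are illustrative only.

```python
import numpy as np

def kappa_linear(mu):
    """Weak lensing (linear) estimator."""
    return 0.5 * (mu - 1.0)

def kappa_no_shear(mu):
    """Shearless estimator for positive parity."""
    return 1.0 - mu ** -0.5

def kappa_isotropic(mu):
    """Isotropic (gamma = kappa) estimator for positive parity."""
    return 0.5 * (1.0 - 1.0 / mu)

# Illustrative magnifications, all outside the critical line.
for mu in np.array([1.1, 1.5, 2.0, 4.0]):
    print(f"mu={mu:4.1f}  lin={kappa_linear(mu):.3f}  "
          f"no-shear={kappa_no_shear(mu):.3f}  "
          f"isotropic={kappa_isotropic(mu):.3f}")
```

At $\mu = 1.1$ the three values agree to within about ten per cent, whereas at $\mu = 4$ the linear form returns 1.5 against 0.5 and 0.375 for the other two.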

3.3 Heuristic estimators 

A typical cluster will be aspherical and clumpy, which means
that the simple approximations described above will not
hold. In fact, an exact local relation between the lens convergence
and shear is not expected to exist when the cluster
is not spherical, as the convergence will generally have a
strong dipole moment, while the shear is dominated by a
quadrupole component. So we need to find a simple function
which relates $\kappa$ and $\gamma$ on average.

Figure 2. An example of the lens convergence, shear, and magnification for a simulated galaxy cluster (the fourth entry in Table 1).
The convergence $\kappa$ was obtained from an N-body simulation using adaptive window smoothing (see text for details). The shear $\gamma$ and
magnification $\mu$ were obtained from $\kappa$ using the thin lens approximation.

Table 1. Properties of the four cluster models used for some of
the Figures. The richness measure $C_{\rm ACO}$ is defined by Mazure
et al. (1995), the parameter $c$ is part of the heuristic estimator
described in Section 3.3.3, and $z_{\rm d}$ and $z_{\rm s}$ are the redshifts of the lens
and the background galaxies respectively, whereas the cosmological
parameter $\sigma_8$ determines how evolved the clusters are.
3.3.1 Numerical cluster models 

A fruitful approach is to look at numerical cluster models,
and find a local relation between $\kappa$ and $\gamma$ by looking at
scatter-plots of these two quantities from the pixels of the
convergence and shear maps of model clusters. This is useful
only when we look at a fair sample of cluster models, which
is representative of the variety of clusters found on the sky.
For this purpose we use the catalogue of high-resolution clus- 
ter models of van Kampen & Katgert (1997), which was con- 
structed to mimic an observed sample (Mazure et al. 1995; 
Katgert et al. 1995). 

The individual cluster models were built using a dissipa- 
tionless N-body code which was supplemented with a recipe 
for galaxy formation and merging (van Kampen 1994, 1997), 
which makes it possible to get Abell cluster properties like 
richness. A group of particles that collapses into a virialized
group with a mass corresponding to that of a galaxy halo
is, during the simulation run, replaced by a single, massive
'galaxy particle'. However, brightest cluster galaxies like
cD's and gE's are not replaced by single particles. For the
lensing properties of their parent cluster this is important,
as both the core and the substructure of the cluster should
be modelled with sufficient resolution (Bartelmann & Weiss
1994; Bartelmann, Steinmetz & Weiss 1995). We adopted
a Plummer softening parameter of $40\,h^{-1}$kpc (comoving),
which is adequate for our purposes; see van Kampen (1994) 
for a more comprehensive discussion on resolution issues 
connected to the numerical simulation technique. Note that 
the resolution of the projected density distribution will auto- 
matically be higher. Therefore, more important is the use of 
adaptive window smoothing (see below), which retains that 
resolution as much as possible during the smoothing which 
is necessary for the calculation of the lens properties, since 
we use the thin lens approximation. 

We use 29 cluster models that constitute a complete
sample for richness $C_{\rm ACO} > 75$ (the entries in boldface in
Table 1 of van Kampen & Katgert 1997). Please refer to
Mazure et al. (1995) for the definition of the richness parameter
$C_{\rm ACO}$. The 29 clusters were simulated for the standard
Cold Dark Matter scenario, and have $\sigma_8 = 0.63$ when put
at a redshift $z_{\rm d} = 0.4$. However, we have also selected four
specific cluster models, with a range in mass, $\sigma_8$, and other
properties relevant for lensing, for demonstrating the various
methods and tests. The least massive of these models 
we may consider to be a 'weak' lens, while the most mas- 
sive one is a 'strong' lens with both caustic lines present for 
most source redshifts. Some of the properties of these four 
models, and the redshift they are put at, are listed in Table 
1. The cluster number relates to the entry in the catalogue 
of cluster models of van Kampen & Katgert (1997), where 
more properties of these models can be found. 

3.3.2 Obtaining the lens properties of simulated galaxy 
clusters 

We obtain the surface mass density (and thus the convergence)
from the numerical models using adaptive window
smoothing (Silverman 1986), with an initial (Gaussian)
smoothing length of $0.25\,h^{-1}$Mpc. This results in having a
smoothing length of $0.05\,h^{-1}$Mpc in the centre of the cluster
models (which is identical to that used by Bartelmann
& Weiss (1994) for their cluster models), and $1.0\,h^{-1}$Mpc in
the outskirts. This provides sufficient resolution for the lens
mapping. For example, giant arcs are formed as expected
(van Kampen 1996). The lens shear and magnification are
calculated using the thin lens approximation, as outlined in
Section 2.1, with the convolution in eq. (2) done by Fast
Fourier Transforms. All lens properties are calculated on a
$1024 \times 1024$ grid which measures $4\,h^{-1}$Mpc on a side. As an
example we show these maps for the fourth cluster of Table
1 in Fig. 2, along with its caustics.

Figure 3. The left column shows scatter plots of the lens magnification $\mu$ versus the lens convergence $\kappa$, with corresponding plots of the
lens shear $\gamma$ versus $\kappa$, for the four clusters listed in Table 1. The parameter $c$ for the $\gamma \propto \kappa^{1/2}$ estimator (dot-dashed line) was fitted
using the $\kappa$-$\gamma$ plots, and is annotated in the right hand panels. Solid lines correspond to the $\gamma = 0$ approximation, dashed lines are for
$\gamma = \kappa$, and dotted lines indicate the weak (i.e. linear) lens approximation.

Figure 4. Comparison of the shear approximations in terms
of their caustic lines. Shown is the lens convergence for a massive
cluster (the fourth entry of Table 1), with its true caustic
lines (black curves) and the caustic lines corresponding to these
approximations superimposed. White curves are for the assumption
that $\gamma \propto \kappa^{1/2}$, dark grey curves for $\gamma = 0$, and light grey
ones for $\gamma = \kappa$.

In Fig. 3 we plot, for the cluster models listed in Table
1, the absolute magnification $|\mu|$ versus the convergence $\kappa$,
and the lens shear $\gamma$ versus $\kappa$, as scatter plots. For clarity 
we plot just one out of every 250 pixels for each calculated 
map. We use these scatter plots to look for approximate 
relations between the lens properties. 
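
The thin-lens calculation described in this subsection can be sketched with FFTs as follows. This is a minimal illustration rather than the procedure actually used: it applies the Fourier-space multipliers that are equivalent to the convolution of eq. (2) followed by the derivatives of eqs. (6) and (8), it assumes periodic boundaries without zero padding, and the function name, grid size and Gaussian test lens are hypothetical.

```python
import numpy as np

def shear_from_convergence(kappa):
    """Shear components (gamma1, gamma2) from a convergence map via FFT.

    Uses psi_hat = -2 kappa_hat / k^2 with gamma1 = (psi_11 - psi_22) / 2
    and gamma2 = psi_12; periodic boundary conditions are assumed.
    """
    kappa = np.asarray(kappa, dtype=float)
    ny, nx = kappa.shape
    k1 = 2.0 * np.pi * np.fft.fftfreq(nx)[np.newaxis, :]  # along x_1
    k2 = 2.0 * np.pi * np.fft.fftfreq(ny)[:, np.newaxis]  # along x_2
    ksq = k1 ** 2 + k2 ** 2
    ksq[0, 0] = 1.0                 # avoid 0/0 at the mean mode
    kappa_hat = np.fft.fft2(kappa)
    kappa_hat[0, 0] = 0.0           # a uniform sheet produces no shear
    gamma1 = np.fft.ifft2((k1 ** 2 - k2 ** 2) / ksq * kappa_hat).real
    gamma2 = np.fft.ifft2(2.0 * k1 * k2 / ksq * kappa_hat).real
    return gamma1, gamma2

# Hypothetical test lens: a circular Gaussian convergence peak.
n = 64
yy, xx = np.mgrid[0:n, 0:n]
kappa_map = 0.3 * np.exp(-((xx - n / 2) ** 2 + (yy - n / 2) ** 2)
                         / (2.0 * 4.0 ** 2))
gamma1, gamma2 = shear_from_convergence(kappa_map)

# Inverse magnification map from eq. (11).
inv_mu_map = (1.0 - kappa_map) ** 2 - gamma1 ** 2 - gamma2 ** 2
```

Flattening `kappa_map` and `np.hypot(gamma1, gamma2)` then gives the pixel pairs for a shear versus convergence scatter plot of the kind shown in Fig. 3.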

3.3.3 $\gamma \propto \kappa^{1/2}$

The heuristic approach should preferably lead to simple
approximations which can be applied to observed data in
an unambiguous way. There will be two caustic lines, which
means that two parities have to be set, so the approximation
should preferably have only a small range of $\mu$ for which
parities need to be set.

Studying Fig. 3, one gets the impression that, on average,
$\gamma$ is proportional to $\kappa^{1/2}$. This assumption leads to
a well-behaved relation between $\mu$ and $\kappa$, which is also invertible.
In general, if we assume

$$ \gamma = \left[ (c + c^{-1} - 2) \kappa \right]^{1/2} , \qquad (20) $$

with $0 < c < 1$ (by convention), then

$$ \mu^{-1} = (\kappa - c)(\kappa - c^{-1}) . \qquad (21) $$



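This implies that there are two $\kappa$'s for each $\mu$, but as we
measure $N/N_0$, we can only obtain $|\mu|$, which can correspond
to four different values of $\kappa$.

The estimator for $\kappa$, given $|\mu|$, is then

$$ \kappa_c = \frac{c + c^{-1}}{2} - \mathcal{S} \left[ \left( \frac{c + c^{-1}}{2} \right)^2 - 1 + \mathcal{P} |\mu|^{-1} \right]^{1/2} , \qquad (22) $$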



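where $\mathcal{P}$ is the lens parity, i.e. the sign of $\mu$, while the new
parity $\mathcal{S}$ indicates on which side of the minimum we are: it is
the sign of $\kappa_{\rm min} - \kappa$, and switches sign around $\mu = \mu_{\rm min}$.
These two minima are

$$ \kappa_{\rm min} = (c + c^{-1})/2 , \quad \mu_{\rm min} = \left[ 1 - \left( \frac{c + c^{-1}}{2} \right)^2 \right]^{-1} . \qquad (23) $$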



Note that $\mu_{\rm min}$ is a local maximum of $\mu$, which is negative there,
and thus a local minimum of $|\mu|$. Also, one
recovers the $\gamma = 0$ approximation when $c = 1$.
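
The two-parity inversion can be sketched as below. It assumes the quadratic of eq. (21), $\mu^{-1} = (\kappa - c)(\kappa - c^{-1})$, solved for $\kappa$ with $\mathcal{S}$ defined as the sign of $\kappa_{\rm min} - \kappa$; the function names and the value $c = 0.5$ are illustrative assumptions.

```python
import numpy as np

def inverse_magnification(kappa, c):
    """Inverse magnification for gamma^2 = (c + 1/c - 2) kappa, eq. (21)."""
    return (kappa - c) * (kappa - 1.0 / c)

def kappa_c(abs_mu, c, P, S):
    """Invert eq. (21) for kappa given |mu| and the two parities.

    P is the sign of mu; S is the sign of (kappa_min - kappa).
    """
    kappa_min = 0.5 * (c + 1.0 / c)
    disc = kappa_min ** 2 - 1.0 + P / abs_mu
    return kappa_min - S * np.sqrt(disc)

# Round trip for one convergence value in each of the four branches.
c_example = 0.5                       # hypothetical value of c
kappa_min = 0.5 * (c_example + 1.0 / c_example)
for kappa_in in (0.2, 1.0, 1.5, 2.5):
    inv_mu = inverse_magnification(kappa_in, c_example)
    P = np.sign(inv_mu)
    S = np.sign(kappa_min - kappa_in)
    print(kappa_in, kappa_c(1.0 / abs(inv_mu), c_example, P, S))
```

Each input convergence is recovered only when both parities are set correctly, which is the practical cost of this estimator compared with the one-parity forms.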

The two critical lines are at $\kappa = c$ and $\kappa = c^{-1}$, as is
obvious from eq. (21). We compare these critical lines to the
true critical lines in Fig. 4, for a fairly massive cluster with
central $\kappa$ larger than one. For comparison, the critical lines
corresponding to the $\gamma = 0$ and $\gamma = \kappa$ estimators are shown
as well, in dark grey and light grey colours respectively.

Combining Eqs. (9), (10) and (20), we see that this
approximation corresponds to assuming

$$ A_{12} A_{21} = \frac{(1 - c)^2}{2c} (2 - A_{11} - A_{22}) - \frac{1}{4} (A_{22} - A_{11})^2 . \qquad (24) $$

In order to solve this equation, we need to make a further
assumption for $A_{11}$, $A_{22}$, and $A_{12}$ (which is equal to $A_{21}$).

We can assume spherical symmetry, i.e. $\kappa = \kappa(x)$, but
unfortunately, as shown in Appendix A, there exists no
spherical solution for which $\gamma = [(c + c^{-1} - 2)\kappa]^{1/2}$. However,
the Plummer potential, which can be written as

$$ \psi(\theta) = \psi_0 \ln(\theta^2 + \theta_c^2) , \qquad (25) $$

where $\psi_0 = M \theta_c^2 / (2\pi R_c^2 \Sigma_{\rm cr})$, does show this behaviour for
$\theta > \theta_c$, where $\theta$ is the angular distance from the centre
of the cluster, $\theta_c$ the angular core radius, $\Sigma_{\rm cr}$ the critical
surface mass density as defined in eq. (5), and $R_c = \theta_c D_{\rm L}$ its
corresponding physical core radius, where $D_{\rm L}$ is the angular
diameter distance from the lens to the observer. For this
potential

$$ \kappa(\theta) = 2\psi_0 \frac{\theta_c^2}{(\theta^2 + \theta_c^2)^2} \quad {\rm and} \quad \gamma(\theta) = 2\psi_0 \frac{\theta^2}{(\theta^2 + \theta_c^2)^2} \qquad (26) $$

(Kochanek and Blandford 1991), which gives the following
relation between $\kappa$ and $\gamma$:

$$ \gamma = \kappa_0^{1/2} \kappa^{1/2} - \kappa = \kappa^{1/2} (\kappa_0^{1/2} - \kappa^{1/2}) , \qquad (27) $$

where $\kappa_0 = \kappa(0) = 2\psi_0/\theta_c^2$. For small $\kappa \ll \kappa_0$ we then have
$\gamma \approx (\kappa_0 \kappa)^{1/2}$, so we can identify $\kappa_0 = c + c^{-1} - 2$.
Clearly, we need to supplement this potential with extra
depth in the core region, as $\kappa_0$ for the Plummer model will
not be very large for typical values of $c$.
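
The Plummer relation can be verified numerically with the short sketch below. It assumes the profiles $\kappa(\theta) = 2\psi_0\theta_c^2/(\theta^2+\theta_c^2)^2$ and $\gamma(\theta) = 2\psi_0\theta^2/(\theta^2+\theta_c^2)^2$ and checks the relation $\gamma = (\kappa_0\kappa)^{1/2} - \kappa$ of eq. (27); the function name and the parameter values are hypothetical.

```python
import numpy as np

def plummer_kappa_gamma(theta, psi0, theta_c):
    """Convergence and shear profiles of the Plummer lens."""
    denom = (theta ** 2 + theta_c ** 2) ** 2
    kappa = 2.0 * psi0 * theta_c ** 2 / denom
    gamma = 2.0 * psi0 * theta ** 2 / denom
    return kappa, gamma

# Illustrative parameters in arbitrary angular units.
psi0, theta_c = 0.3, 1.0
theta = np.linspace(0.0, 10.0, 50)
kappa, gamma = plummer_kappa_gamma(theta, psi0, theta_c)
kappa0 = 2.0 * psi0 / theta_c ** 2

# Eq. (27): gamma = sqrt(kappa0 * kappa) - kappa at every radius.
print(np.max(np.abs(gamma - (np.sqrt(kappa0 * kappa) - kappa))))

# Far outside the core the square-root term dominates.
print(gamma[-1] / np.sqrt(kappa0 * kappa[-1]))
```

The second printed ratio approaches one at large radii, which is the regime in which the Plummer lens mimics $\gamma \propto \kappa^{1/2}$.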

One might be able to get $\gamma \propto \kappa^{1/2}$ by constructing
more complicated potentials, involving an elliptical component
with ellipticity growing as a function of radius (constant
ellipticity does not work), or a quadrupole component
of some sort. However, we can simply treat this approximation
as a heuristic one, motivated by simplicity and invertibility.

In order to obtain the convergence and shear from the
numerical models, which are discrete in nature, we had to
apply smoothing. Even though the adaptive smoothing allows
relatively high resolution to be retained in the core
of the cluster, one might worry that the central values of
both $\kappa$ and $\gamma$ are artificially reduced. Because the shear is a
global function of the convergence through the convolution
of eq. (2), this would be most severe for $\kappa$. We therefore
tried several basic smoothing lengths, from $0.1\,h^{-1}$Mpc to
$0.5\,h^{-1}$Mpc. The ones smaller than the value of $0.25\,h^{-1}$Mpc
that we actually use gave the same results, i.e. similar scatter
plots and the same value for $c$ from the fit. The only
difference is in the very centre of the cluster, where $\kappa$ is
slightly more peaked so that $\gamma$ has somewhat larger maximum
values. However, the discreteness of the numerical
simulation does become quite visible for these small values
of the basic smoothing radius. Oversmoothing affects
the results more severely, pushing $\gamma$ up in the outskirts and
down in the centre. So, provided that the basic smoothing
length is chosen sensibly, it seems that the use of adaptive
window filtering results in reliable convergence, shear and
magnification maps.

3.3.4 $\gamma \propto \kappa$

Another assumption which leads to a simple, invertible relation
with $\mu(0) = 1$ is $\gamma = a\kappa$. This leads to

$$\mu^{-1} = (\kappa - 1)^2 - (a\kappa)^2 , \qquad (28)$$

and inverts as

$$\kappa_a = \frac{1 - \mathcal{T}\left[(1 - a^2)\,\mathcal{P}|\mu|^{-1} + a^2\right]^{1/2}}{1 - a^2} , \qquad (29)$$

where $\mathcal{T}$ is a parity similar to the parity $\mathcal{S}$ of the $\gamma \propto \kappa^{1/2}$
approximation. This approximation corresponds to pivoting
the $\gamma = 0$ approximation around (0, 0, 1) in Fig. (11). It can
serve as an approximation intermediate to the $\gamma = 0$ and
$\gamma = \kappa$ approximations, but it has the disadvantage of the
extra parity. More importantly, this approximation is a bad
fit to the numerical simulations, so we will refrain from using
it.
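As a quick consistency check of the inversion, the following minimal sketch evaluates eq. (28) for a given convergence and then inverts it. The function names are illustrative only, and the inverse magnification is passed in signed form (i.e. the parity of $\mu$ is assumed to be known); the case $a = 1$ is excluded because the quadratic then degenerates.

```python
import math


def inverse_magnification(kappa, a):
    """Signed mu^{-1} for the assumed shear gamma = a * kappa (eq. 28)."""
    return (kappa - 1.0) ** 2 - (a * kappa) ** 2


def kappa_a(inv_mu, a, T=1):
    """Invert eq. (28) for kappa, assuming a != 1.

    inv_mu is the signed inverse magnification; T = +1 or -1 selects
    which of the two roots of the quadratic is returned.
    """
    disc = (1.0 - a * a) * inv_mu + a * a
    if disc < 0.0:
        raise ValueError("no real solution for this magnification")
    return (1.0 - T * math.sqrt(disc)) / (1.0 - a * a)


# Round trip: both roots reproduce the same inverse magnification.
inv_mu = inverse_magnification(0.3, 0.5)
low_root = kappa_a(inv_mu, 0.5, T=1)
high_root = kappa_a(inv_mu, 0.5, T=-1)
```

The two roots illustrate why the extra parity is needed: the magnification alone cannot distinguish between them.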

3.4 Iterative estimate 

For each of the convergence estimators discussed above, we
can find an iterative solution by calculating the shear
corresponding to the estimate for $\kappa$ using the thin lens
approximation described in Section 2.1. In other words, we use the
estimate for $\kappa$ to calculate the lens deflection $\alpha$ using eq.
(2), and then get $\gamma$ from $\alpha$. This shear is then used to find
a new estimate for $\kappa$ by simply applying eq. (11). Because
of the convolution, this estimate is non-local.

A problem with this estimate is that the observed
magnification is usually smoothed on a scale larger than
the smallest significant structures in the lens. The shear
field has a strong quadrupole component, with small-scale
structure which is not present in the measured smoothed
magnification, although it is there in the unsmoothed
magnification. This means that the new estimate for $\kappa$ derived
from eq. (11) will have this quadrupole structure imprinted
by the shear estimate. In the next step the second estimate
for the shear will have even more structure on small scales,
generated by the small-scale structure which is present in
the second estimate for $\kappa$. The iteration therefore does not
converge.



3.5 Comparison 

In principle, the best strategy, if data quality allows it, is
to use the iterative method, starting from one of the
estimates discussed above. If parities can be assigned reliably,
$\kappa_c$ seems a good choice; otherwise, when one has only
approximate knowledge about one critical line, one should
start from $\kappa_1$. However, because of the problems associated with
the iterative estimate due to the likely smoothness of the
observed lens magnification, in practice the $\kappa_c$ estimator is
to be preferred.

We show the caustic lines corresponding to the one-caustic
and $\kappa_c$ estimators for two of the lens models in
Fig. (4), along with the real caustic lines for those models.
Clearly the $\kappa_c$ estimator has its caustic lines closest to
the real ones. The one-caustic estimators are mostly useful
for reference, as possible limits, or for observational data
of poor quality. A comparison of caustics alone is not enough to
judge the usefulness of the various approximations.
The $\kappa_c$ and $\kappa_1$ estimators, for example, cross at
$\kappa = c + c^{-1} - 2$, so a total mass estimate where the total
mass is calculated beyond this crossover point might be
similar for both approximations. We therefore test all
estimators in the next Section, both on perfect magnification
data as well as on mock galaxy counts.
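The crossover value can be checked in a few lines. The sketch below assumes that the heuristic shear is normalised as $\gamma = |1 - c|\,(\kappa/c)^{1/2}$; this normalisation is an assumption made here for illustration (it is the form that reproduces the quoted crossover), and the inverse magnification is taken to be $(1-\kappa)^2 - \gamma^2$.

```latex
\begin{align}
\mu^{-1}\big|_{\gamma=\kappa} &= (1-\kappa)^2 - \kappa^2 , \\
\mu^{-1}\big|_{\gamma\propto\kappa^{1/2}} &= (1-\kappa)^2 - \frac{(1-c)^2}{c}\,\kappa , \\
\kappa^2 = \frac{(1-c)^2}{c}\,\kappa
\;&\Longrightarrow\;
\kappa = \frac{(1-c)^2}{c} = c + c^{-1} - 2 .
\end{align}
```

At this convergence both assumptions imply the same shear, so the two estimators return the same value for a given magnification.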



4 TESTING THE CONVERGENCE 
ESTIMATORS ON CLUSTER MODELS 

We test the convergence estimators on a sample of numer- 
ical cluster models in order to examine how well each of 
them performs in different circumstances. The models al- 
low us to test how well one can reconstruct the convergence 
from the magnification alone, by simply comparing the re- 
sult of applying the estimator to the true convergence. With 
the addition of simple background galaxy distributions we 
also produce mock lensed galaxy distributions which we can 
'observe', thus providing a direct test of the magnification 
method. Such tests should reveal intrinsic uncertainties and 
systematic offsets when this method is applied to observa- 
tional data. 

We perform two types of tests on two types of data. We 
test on clean simulated magnification maps with full knowl- 
edge of the image parity, and for a single source redshift, and 
we test on mock lensed galaxy distributions, performing the 
whole route from galaxy counts to mass estimation. 

4.1 Estimated versus true convergence for 
known magnification 

At this point we forget about all possible observational prob- 
lems. In other words, we assume that we have a perfect 
magnification map of the lensing cluster (including the par- 
ity), and investigate how well the approximations allow us 
to reconstruct the convergence from just this magnification 
map. This should show us the best possible result each ap- 
proximation can provide us, and reveals systematic effects 
associated with the estimators alone (as compared to other 
possible, mainly observational, sources of error). 

In order to quantify these statements, we look at statistics
which are based on a pixel-to-pixel comparison of
estimated convergence maps to true convergence maps for a
sample of rich cluster models. We are interested in the average
performance for a typical rich cluster. For this purpose
we selected 29 rich cluster models (see Section 3.3) from
the catalogue of van Kampen & Katgert (1997). We then
calculated the average of various statistics applied to each
of these models. The results are listed in Table 2, and are
described below.

Figure 5. Comparison of estimated $\kappa$ versus true $\kappa$. The middle column shows estimated $\kappa$ maps for the $\gamma = 0$, $\gamma = \kappa$, $\gamma \propto \kappa^{1/2}$, and
iterative (1 step) estimators. The right column shows the fractional difference (as percentages) of true versus estimated $\kappa$ maps. The
top left panel shows the true $\kappa$, the panel below that the linear estimate for $\kappa$, i.e. $(\mu - 1)/2$. The iteratively estimated shear is shown in
the bottom left corner, below the true shear. The model cluster used is the fourth entry of Table 1.

For each model we calculated the minimum and max- 
imum difference between the estimated and the true pix- 
els/bins, denoted in Table 2 by 'min' and 'max' respectively, 
and the average over all pixels of this difference, denoted by 
'mean'. These statistics describe local and mean deviations 
from the true convergence. Two more statistics describe the 
scatter in the residual of true minus estimated convergence: 
'abs' is just the average absolute deviation, while 'rms' is 
the standard r.m.s. deviation. These are good measures for 
the accuracy of the estimators. All statistics were calculated
for absolute differences in $\kappa$, and for fractional differences,
expressed in percentages.
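The statistics described above can be sketched as follows. This is only an illustration: the function names are hypothetical, residuals are taken as estimated minus true (so that a positive value is an overestimate), the r.m.s. is taken about zero rather than about the mean, and the true values are assumed to be non-zero for the fractional version.

```python
import math


def residual_statistics(estimated, true):
    """Return min, max, mean, abs and rms of (estimated - true).

    Both arguments are flat sequences of pixel (or bin) values.
    """
    diffs = [e - t for e, t in zip(estimated, true)]
    n = len(diffs)
    return {
        "min": min(diffs),
        "max": max(diffs),
        "mean": sum(diffs) / n,
        "abs": sum(abs(d) for d in diffs) / n,
        "rms": math.sqrt(sum(d * d for d in diffs) / n),
    }


def fractional_statistics(estimated, true):
    """Same statistics for the fractional residuals (estimated - true) / true."""
    ratios = [e / t for e, t in zip(estimated, true)]
    return residual_statistics(ratios, [1.0] * len(ratios))


stats = residual_statistics([0.2, 0.4], [0.1, 0.5])
frac = fractional_statistics([0.2, 0.4], [0.1, 0.5])
```

Multiplying the fractional values by 100 gives percentages.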

An additional estimator has been added to Table 2,
labelled 'mean', which is simply the mean of the
$\gamma = 0$ and $\gamma = \kappa$ estimators. This estimator has no physical
basis, and even produces imaginary shear for some values of
$\kappa$, but might be useful in practice as the $\gamma = 0$ estimator
tends to overestimate and the $\gamma = \kappa$ estimator tends to underestimate.
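The simple estimators compared here can be sketched in a few lines, assuming the inverse magnification $\mu^{-1} = (1-\kappa)^2 - \gamma^2$. The function names are illustrative, and the parity in the $\gamma = 0$ case is passed explicitly as the assumed sign of $1 - \kappa$ rather than derived from the data.

```python
import math


def kappa_linear(mu):
    """Weak lensing estimate, kappa = (mu - 1) / 2."""
    return (mu - 1.0) / 2.0


def kappa_gamma_zero(mu, parity=1):
    """Estimate assuming gamma = 0; parity is the assumed sign of (1 - kappa)."""
    return 1.0 - parity / math.sqrt(abs(mu))


def kappa_gamma_kappa(mu):
    """Estimate assuming gamma = kappa, using the signed magnification."""
    return (1.0 - 1.0 / mu) / 2.0


def kappa_mean(mu, parity=1):
    """Mean of the gamma = 0 and gamma = kappa estimates."""
    return 0.5 * (kappa_gamma_zero(mu, parity) + kappa_gamma_kappa(mu))


# A shear-free lens with kappa = 0.3 has mu = 1 / (1 - 0.3)^2.
mu = 1.0 / 0.7 ** 2
estimates = {
    "linear": kappa_linear(mu),
    "gamma=0": kappa_gamma_zero(mu),
    "gamma=kappa": kappa_gamma_kappa(mu),
    "mean": kappa_mean(mu),
}
```

For this shear-free example the $\gamma = 0$ estimate recovers the input, the $\gamma = \kappa$ estimate falls below it, and the linear estimate lies well above it.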

4.1.1 Maps 

Using the thin lens approximation, as summarized in Section
2.1, we produce maps of the lens convergence, shear and
magnification for the selected four cluster models. We then
use the magnification only (with full knowledge of its parity,
though), to reconstruct convergence maps using the various
assumptions about the shear, and compare these to the true
convergence. This is shown in Fig. 5 for the most massive
model cluster, as this one has the largest range of possible $\kappa$
values and therefore provides the most stringent test for the
estimators. Note that this cluster has a value of $\sigma_8$ which is
far larger than allowed for the standard CDM scenario (e.g.
van Kampen & Katgert 1997 and references therein).

In Fig. 5 we see that the linear estimator performs very
poorly, as it just follows the magnification, including the
caustics. It only does well for small $\kappa$, as expected. The
$\gamma = 0$ assumption produces an overestimate of the
convergence for all regions of the cluster. The $\gamma = \kappa$ estimator
underestimates the mass in the central regions of the cluster,
and (slightly) overestimates for $\kappa < 0.2$. The $\gamma \propto \kappa^{1/2}$
estimator clearly performs best, even better than the iterative
estimator.

As the cluster shown in Fig. 5 is fairly extreme, we
should consider the performance statistics listed in Table 2
that were obtained for the rich cluster sample, which
contains less evolved clusters with smaller overall convergences.
Much of what was seen in Fig. 5 is now expressed
quantitatively. The linear estimator diverges, while the $\gamma = 0$
and $\gamma = \kappa$ estimates generally over- and underestimate,
respectively. Note that the fractional and absolute statistics
weight pixels differently, the former giving more weight to
the more noisy pixels outside of the cluster, and the latter
to the central regions. This results in a mean overestimate
in the fractional statistics of both the $\gamma = 0$ and $\gamma = \kappa$
estimators. The $\gamma \propto \kappa^{1/2}$ and iterative estimators perform
best, although the improvement over the other estimators is
certainly not dramatic.

4.1.2 Radial profiles

Besides testing the estimators on the full 2D map of the 
lens magnification, which is hard to obtain in practice, we 
test them on azimuthally averaged 'magnification profiles' 
as well. These have already been obtained observationally 
(Fort et al. 1997; Taylor et al. 1998), and are therefore of 
particular interest. 

Again we first look at radial binning of the magnifica- 
tion map directly obtained from a numerical simulation, i.e. 
we only investigate the effect of the annular binning proce- 
dure. We performed the same statistics on the same model 
cluster sample as above, but now on the radially binned 
data. One expects that the binning reduces the scatter in 
the local relations, but at the same time binning means loss 
of information, especially for clumpy aspherical clusters. 

From the performance statistics listed in the lower half
of Table 2 we can see that the performance for profiles is
worse than for the maps. This stems from the fact that a lot
more weight is put on the central region of the clusters due
to the annular binning, where the uncertainties are largest.
The $\gamma = 0$ and $\gamma \propto \kappa^{1/2}$ approximations clearly perform
best in terms of accuracy, as expressed by the r.m.s. statistic,
whereas the mean of the $\gamma = 0$ and $\gamma = \kappa$ estimators has the
smallest systematic error.

4.2 Estimated versus true convergence for 
simulated observational data 

4.2.1 Simulated background galaxy distributions

The tests we performed so far were for the best possible data 
set, a continuous magnification map or annular profile, with 
known sign, and one source redshift. Obviously, this is not 
going to be the case for observational data, which has many 
intrinsic uncertainties as described in Section 2.3. 

The main intrinsic sources of error are the clustering of
background galaxies, the variation in their redshifts, and
the fact that we usually have a limited number of galaxies to
count, which introduces shot noise. In order to examine the
effect of these uncertainties on the convergence estimate, we
need to construct magnification maps derived from number
counts of a clustered set of background galaxies which are
at different redshifts.

We use background galaxy distributions which were
generated as in BTP, who adopt the lognormal-Poisson
model (Coles and Jones 1991) as a simple but sufficient
model for the clustering of background galaxies. These
galaxies were generated in planes of constant redshift, with
luminosities drawn from a Schechter luminosity function

$$\phi(L) = \phi_*(z) \exp(-L/L_*) , \qquad (30)$$

where $\phi_*(z) = 0.02\,h^2 (1 + z)^2$ Mpc$^{-3}$, and $L_*$ is taken
constant. The redshift distribution for these galaxies can be
well approximated by n(z) — Az* z e ' * , with z„ = 0.8. 
Like BTP, we consider R-band counts. In order to be able to
count lensed (i.e. amplified) distributions down to a limiting
magnitude $m_{\rm lim}$, we generate background distributions
down to at least $m_{\rm lim} + 1$.
Table 2. Performance statistics of the convergence estimators,
based on a pixel-by-pixel comparison of the estimated versus true
convergence maps (top half), and a bin-to-bin comparison for the
estimated versus true profiles (bottom half). A positive sign corresponds to
overestimating the convergence. Please refer to the main text for
a description of the statistics.

Statistic     linear    γ = 0    γ = κ     mean   γ ∝ κ^{1/2}   Iterat.

Absolute statistics for 2D maps
min            -0.01    -0.09    -1.19    -0.57      -0.43       -0.12
max             7.27     0.23     0.06     0.10       0.06        0.11
mean            0.11     0.01    -0.02    -0.01      -0.02       -0.01
abs             0.11     0.01     0.03     0.02       0.03        0.01
rms             0.46     0.03     0.09     0.04       0.04        0.01

Fractional statistics for 2D maps
min            -6.04    -6.09    -6.50    -6.13      -6.93       -8.32
max            21.53     1.97     1.37     1.64       0.75        0.70
mean            0.68    -0.49    -0.83    -0.66      -1.58       -1.44
abs             1.87     0.97     1.04     0.96       1.61        1.47
rms             2.72     1.24     1.17     1.18       1.18        1.52

Absolute statistics for radial profiles
min             0.05    -0.20    -6.44    -2.74      -1.78
max           864.70     1.78     0.11     0.26       0.12
mean           20.02     0.33    -0.72    -0.19      -0.32
abs            20.02     0.35     0.78     0.34       0.34
rms            86.38     0.44     1.65     0.69       0.44

Fractional statistics for radial profiles
min             2.40     0.06   -11.35    -4.79      -4.97
max          1121.60     4.60     1.80     2.74      -0.92
mean           37.61     2.26    -1.39     0.44      -2.90
abs            37.61     2.35     2.68     1.79       3.01
rms           116.44     1.23     3.73     2.02       1.11


The field count slopes of these distributions for two
magnitude intervals and two lens redshifts are obtained from
fits to the luminosity functions of the 32 samples generated.
This is shown in Fig. 7, with the values of the slopes
annotated. These luminosity functions are good fits to the
observed ones (e.g. Metcalfe et al. 1995), except possibly
for the faint end, where the slope should remain roughly
constant instead of flattening, i.e. where the simple
assumptions of BTP are likely to break down. However, the
observations at the faint end are still contradictory.
Furthermore, a flattening slope actually provides a useful test
for our methods, as other bands like the U-band will show
this behaviour.

With these background distributions it is straightfor- 
ward to produce mock observations. All galaxies (i.e. down 
to R = 25) are mapped to the image plane using the lens 
mapping, with the deflection angle calculated using the thin 
lens approximation, as described earlier. We then select 
galaxies for counting from a range in magnitude. 

Before we can obtain a magnification map from these
number counts, we need to either smooth the number count
distribution, or azimuthally bin the counts to obtain count
profiles. For a limited number of galaxies the latter is the only
feasible alternative. We consider both alternatives, and test
for the interval 22 < R < 24. An example of the former is
given in Fig. 6, which shows a background distribution cut
to 22 < R < 24 and its corresponding lensed distribution,
also cut to 22 < R < 24 but obtained using the full R < 25
background population. Close to the caustics the amplification
will be more than one magnitude (i.e. $\mu > 2.51$), so
we are likely to miss a few galaxies there, which will slightly
enhance the presence of the caustics in our simulations.

Figure 6. Example of a number count simulation. The top panel
shows a generated background galaxy distribution using the simple
model of BTP. The bottom panel shows the same distribution
lensed, with the true convergence contours superimposed (for
$z_s = 0.8$). Galaxies are generated up to R = 25, but only plotted
in the range 22 < R < 24. This allows for a factor of 2.51
magnification, which is sufficient in most cases.

The mock observations do not simulate colour cutting,
or problems associated with masking the cluster galaxies.
The cluster galaxies are modelled (see van Kampen & Katgert
1997 for details), and take part in the lensing, but are
not put in the observed image and then masked out. These
observational difficulties are hard to model, and beyond the
scope of the present paper.

Figure 7. Cumulative luminosity function for all 32 backgrounds
stacked together, with fits for the average slope over a range in R.
The top curve is for galaxies beyond z = 0.4, the bottom one for
z > 0.2. The normalisation is arbitrary, and chosen for clarity.

4.2.2 Setting parity and the estimator parameters

Setting the parity is the biggest problem for all estimators,
and will be the most significant uncertainty in the central
regions of the observed cluster. However, giant arcs,
especially with redshifts measured, can be used to guess where the
caustics are. In the tests we performed above the parity was
determined from the simulations, i.e. we actually used the
sign of $\mu$ for estimating the convergence. We now disregard
any such knowledge, and set the parity by hand.

Concerning the $\gamma \propto \kappa^{1/2}$ approximation, we can get its
parameter $c$ from observed data only if we make an assumption
for the behaviour of $\kappa$ between the critical curves. For
number counts in radial bins we can estimate $c$ if we know
both the position of the minimum $\mu_{\rm min}$, $\theta_{\rm min}$, and one of
the critical lines (or both), $\theta_{\rm inner}$ or $\theta_{\rm outer}$, and assume a
functional form for $\kappa$ as a function of $\theta$ within the range
[$\theta_{\rm inner}$, $\theta_{\rm outer}$]. For example, the assumption that between
the critical lines $\kappa \propto \theta^{-1}$ leads to:

^ "min ' ^ "min ' \ "inner ' 

If all three characteristic positions are measurable, an average
of the three $c$-estimators should provide the best estimate.
We can use this estimate for $c$ to estimate $\kappa(\theta)$, and
use that to obtain a new estimate for $c$. Such an iteration
should work if we deal with the parities properly.

4.2.3 Smoothed maps from discrete number counts

We generate mock number counts by lensing background
distributions which were produced up to the red magnitude
R = 25 as described in Section 4.2.1. We then count galaxies
in the range 22 < R < 24 only. For almost all galaxies the
magnification is less than one magnitude, so this provides
a good approximation. We also count galaxies in the same
field-of-view for the unlensed distribution, in order to obtain
the field count $N_0$. The average slope $S_R$ is fitted from a
reconstructed luminosity function obtained from the same
unlensed distribution, as this depends on the redshift of the
lens, $z_L$. For several values of $z_L$ and R ranges we show these fits
for $S_R$ in Fig. 7.

If we have a sufficient number of background
galaxies we can try to obtain a magnification map
from the number counts. We have to deal with shot noise,
and therefore obtain a smoothed number count map from
the discrete galaxy distribution. This is not straightforward,
as we have to deal with caustics, where $\mu^{-1} = 0$.
If we smooth the number counts without taking parity into
account, there will be few points on the map that even
approach zero after the smoothing operation. We therefore
apply the following 'trick': we set the parity before smoothing,
by making $N/N_0$ negative where we believe that $\mu^{-1}$ is negative.
This ensures that the caustics we fix by setting the parity
remain in place, and gives a much better estimate for the
magnification near the caustics as well.
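The effect of this 'trick' can be illustrated in one dimension. The sketch below is only a toy example under stated assumptions: a simple boxcar running mean stands in for whatever smoothing kernel is actually used, the function names are hypothetical, and the count ratios and parities are made-up numbers chosen so that a caustic lies between the third and fourth bin.

```python
def boxcar_smooth(values, half_width):
    """Running mean over 2 * half_width + 1 samples, truncated at the edges."""
    n = len(values)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def smooth_with_parity(count_ratio, parity, half_width=1):
    """Smooth N/N0 after multiplying each bin by its assumed parity (+1 or -1)."""
    signed = [p * r for p, r in zip(parity, count_ratio)]
    return boxcar_smooth(signed, half_width)


# Toy profile: counts dip towards a caustic between bins 2 and 3.
ratio = [0.9, 0.6, 0.3, 0.3, 0.6, 0.9]
parity = [-1, -1, -1, 1, 1, 1]

signed_result = smooth_with_parity(ratio, parity)
unsigned_result = boxcar_smooth(ratio, 1)
```

In this toy case the signed, smoothed profile still changes sign between the third and fourth bin, whereas the unsigned smoothed profile never comes close to zero.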

The smoothing scale needed is determined by the average
surface number density of galaxies. We set it to be
three times the Poissonian nearest neighbour distance, i.e.
$3(\pi \langle n \rangle)^{-1/2} \approx 1.7 \langle n \rangle^{-1/2}$. This is to make sure the
smoothing is sufficient for the core region, which is relatively
devoid of galaxies due to the effect we try to measure.
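As a small worked example, the smoothing scale can be evaluated directly from the surface density. The function name is illustrative; the two densities used below are the B- and I-band values quoted for CL 0024+1654 in Section 5.1.

```python
import math


def smoothing_length(surface_density):
    """Return 3 * (pi * <n>)^(-1/2), in the length unit implied by <n>."""
    return 3.0 / math.sqrt(math.pi * surface_density)


# Densities in arcmin^-2, so the result is in arcmin; convert to arcsec.
length_b_arcsec = 60.0 * smoothing_length(6.13)
length_i_arcsec = 60.0 * smoothing_length(3.32)
```

These evaluate to roughly 41 and 56 arcsec, in line with the "about 40 arcsec and one arcmin" quoted for that cluster.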

An example of estimating the convergence map from
simulated number counts is shown in Fig. 8. Again the
fourth cluster from Table 1 was used. As it is put at a
redshift of 0.4, we used a slope $S_R = 0.197$ for obtaining the
magnification map from the number counts. We see that we
can reasonably well reconstruct the main features of the
convergence map from these counts, but a lot of information is
lost. The cluster is detected, most precisely by the $\gamma \propto \kappa^{1/2}$
estimator, but the limitations of the magnification method
are obvious. Of course one can simply improve the signal-to-noise
ratio by going to fainter magnitudes, which increases the
number of galaxies and includes galaxies at higher redshift,
which pushes up the convergence.

4.2.4 Annular binning of number counts

Usually the number of background galaxies is not sufficient 
to produce 2D magnification maps, and one is restricted to 
counts in annular bins. We again only select galaxies with 
R magnitudes in the range 22 < R < 24, and count them in 
bins around the centre found from the galaxy distribution 
(the cluster models contain galaxies as well, see van Kam- 
pen & Katgert 1997). Proceeding from number counts to 
magnification and convergence estimates as before, we now 
obtain estimated convergence profiles. We do this for 32 
different backgrounds, in order to get an estimate for the 
intrinsic error on the number counts due to shot noise and 
background clustering. 
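The annular counting step can be sketched as follows. This is a minimal illustration with hypothetical names: galaxy positions are plain (x, y) pairs, the bin edges are increasing radii in the same units, and the field density $N_0$ is assumed to be known from the unlensed counts.

```python
import math


def annular_count_ratio(positions, centre, bin_edges, field_density):
    """Return N/N0 per annulus for galaxies at the given (x, y) positions.

    bin_edges are increasing radii; field_density is the unlensed
    surface number density N0 in matching units.
    """
    counts = [0] * (len(bin_edges) - 1)
    for x, y in positions:
        r = math.hypot(x - centre[0], y - centre[1])
        for i in range(len(counts)):
            if bin_edges[i] <= r < bin_edges[i + 1]:
                counts[i] += 1
                break
    ratios = []
    for i, n in enumerate(counts):
        area = math.pi * (bin_edges[i + 1] ** 2 - bin_edges[i] ** 2)
        ratios.append(n / area / field_density)
    return ratios


# Four galaxies, two annuli; N0 chosen so that both ratios come out as unity.
galaxies = [(0.5, 0.0), (0.0, 1.5), (-1.5, 0.0), (0.0, -1.5)]
profile = annular_count_ratio(galaxies, (0.0, 0.0), [0.0, 1.0, 2.0], 1.0 / math.pi)
```

Repeating this for each of the different backgrounds and taking the scatter per bin gives the kind of intrinsic error estimate described above.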

This procedure is illustrated in the top panel of Fig. 9,
where we show the number count profile for one background
(thick solid line), with intrinsic error bars, obtained using all
backgrounds, superimposed. The thin solid line that is also
plotted shows the average over 32 different number count
profiles obtained by just changing the background population.
The middle panel of Fig. 9 shows the reconstruction
of the convergence profile for the single background, along
with the convergence profile obtained directly from the
simulation. The bottom panel shows the same, but now for
the averaged profile. The reconstruction for the latter is
obviously better. But despite the noise, one can clearly get a
fair estimate for the convergence profile from number counts
alone.

Figure 8. Illustration of the complete route from number counts to estimation of the lens convergence, for a fairly massive cluster (the
fourth entry from Table 1). The top left shows the number count distribution in the range 22 < R < 24, which is smoothed as described
in the main text to obtain a map of number count variations, $N/N_0$. From this map we obtain the magnification map, shown in the
top right panel. The fourth panel shows the true convergence map, obtained directly from the numerical simulation, which we try to
estimate. The next four panels show estimates obtained from the magnification map alone, for, respectively, the $\gamma = 0$ approximation,
for $\gamma = \kappa$, for $\gamma \propto \kappa^{1/2}$, and performing three steps of the iterative scheme. The bottom right panel shows the shear corresponding to
this iterative estimate.



Finally, we examine how well we can estimate the total
mass within a certain annulus from these count simulations.
We simply convert convergence to surface mass density using
the mean redshift of our background galaxies, which is
around 0.8 for the simulations shown, and integrate that to
obtain a projected mass within $0.5\,h^{-1}$ Mpc. We then compare
this mass estimate for the cluster to the mass obtained
directly from the numerical cluster model.
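The conversion from a convergence profile to a projected mass can be sketched as below. Several things here are assumptions made for illustration: an Einstein-de Sitter cosmology ($\Omega_0 = 1$) is used for the angular diameter distances, the surface mass density is $\kappa$ times the usual critical density $c^2 D_s / (4\pi G D_l D_{ls})$, all background galaxies sit at a single source redshift, and the function names are hypothetical.

```python
import math

C_KM_S = 299792.458            # speed of light [km/s]
G_MPC = 4.30091e-9             # Newton's constant [Mpc (km/s)^2 / Msun]
HUBBLE_DIST = C_KM_S / 100.0   # c / H0 [h^-1 Mpc]


def ang_diam_dist(z1, z2):
    """Angular diameter distance from z1 to z2 for Omega_0 = 1 [h^-1 Mpc]."""
    return (2.0 * HUBBLE_DIST / (1.0 + z2)
            * ((1.0 + z1) ** -0.5 - (1.0 + z2) ** -0.5))


def sigma_crit(z_lens, z_source):
    """Critical surface mass density [h Msun Mpc^-2]."""
    d_l = ang_diam_dist(0.0, z_lens)
    d_s = ang_diam_dist(0.0, z_source)
    d_ls = ang_diam_dist(z_lens, z_source)
    return C_KM_S ** 2 / (4.0 * math.pi * G_MPC) * d_s / (d_l * d_ls)


def projected_mass(kappa_profile, bin_edges, z_lens, z_source):
    """Sum kappa * Sigma_crit * annulus area [h^-1 Msun].

    bin_edges are annulus radii in h^-1 Mpc, one more than kappa values.
    """
    s_crit = sigma_crit(z_lens, z_source)
    mass = 0.0
    for i, kappa in enumerate(kappa_profile):
        area = math.pi * (bin_edges[i + 1] ** 2 - bin_edges[i] ** 2)
        mass += kappa * s_crit * area
    return mass


s_crit = sigma_crit(0.4, 0.8)
mass = projected_mass([1.0, 1.0], [0.0, 0.25, 0.5], 0.4, 0.8)
```

For a lens at z = 0.4 and sources at z = 0.8 this gives a critical density of order $6 \times 10^{15}\,h\,M_\odot$ Mpc$^{-2}$, so a mean convergence of a few tenths within $0.5\,h^{-1}$ Mpc corresponds to a projected mass of order $10^{15}\,h^{-1}\,M_\odot$.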

Fig. 10 shows the results of this exercise. The top panel
shows the distribution over estimated masses for the sample
of rich clusters that we considered before. The arrow
indicates the mean true mass for this sample, whereas the
vertical lines give the mean for the estimated masses. The
linear mass estimate is a factor of two too large, while all
other estimators give similar values, with the $\gamma = 0$
estimator performing marginally better. Just one background
was used, but we can also consider the effect of changing the
background, i.e. cosmic variance, by putting a single cluster
in front of 32 different backgrounds. This is shown in
the second panel of Fig. 10, for the third cluster from Table 1.
As this cluster is one of the most massive from the
sample, it is no surprise to see that the linear estimator is
now on average a factor of four too large, with a huge scatter.
The other estimators have a significant scatter as well,
quantifying what was already seen in Fig. 8. This scatter
decreases for larger convergences, when looking at the same
cluster for larger $\sigma_8$ (shown in the third panel). The cluster
is then more evolved and therefore more massive. The last
panel is the same as the third, but with the linear estimator
included, which is now quite far off. We see that for the
more massive clusters the $\gamma \propto \kappa^{1/2}$ estimator works best on
average, but that the scatter among the estimators is quite
similar.

Figure 9. Simulated number count profiles for the same cluster
used for Fig. 8, and also for 22 < R < 24. The top panel shows a
number count profile for a single background (thick solid line), along
with the average of number count profiles for 32 different backgrounds
(thin solid line). The latter provide an estimate for the intrinsic
uncertainty due to shot noise and clustering of the background
galaxies, and these are overplotted as error bars. The second
panel shows the convergence estimates from the single background
number count profile shown in the top panel, for $\gamma = 0$ (dotted
line), $\gamma = \kappa$ (dashed line), and $\gamma \propto \kappa^{1/2}$ (dot-dash line). The
true convergence profile, for $z_s = 0.8$, is plotted as a solid line.
The bottom panel shows the same, but for the average over 32
different background populations. An intrinsic count slope of 0.198
is used.

Figure 10. Histograms of estimated total projected cluster
masses within $0.5\,h^{-1}$ Mpc, obtained from number count profiles.
Dotted lines denote the linear (weak lensing) approximation, solid
lines are for $\gamma = 0$, dashed for $\gamma = \kappa$, and dot-dashed for $\gamma \propto \kappa^{1/2}$.
The top panel shows the distribution for 29 rich clusters put at
$z_L = 0.4$ in front of the same background galaxy population.
The vertical lines indicate the average over the estimated projected
masses of these clusters, while the arrow shows the average
over their true projected masses. The second panel shows the
same, but now for one cluster (entry three from Table 1) in front
of 32 different backgrounds. The lines and arrow are now the estimated
and true masses of that cluster. The third panel shows the
same cluster for larger $\sigma_8$, i.e. more massive. The fourth panel is
the same as the third, but with the linear estimator included.

5 OBSERVED CLUSTERS 

We apply the non-linear estimators for the lens convergence
to published observational data, in order to test whether
reasonable mass estimates for real clusters can be made, and
how these compare to masses determined using other,
independent techniques. We would also like to establish what the errors
are due to the uncertainty in the assumptions of the
non-linear estimators when applied to real data, especially for
the heuristic estimator. After all, so far we only tested the
heuristic estimator on cluster models which were also used
to find that estimator. Presently, data of sufficient quality
exist for two clusters only, CL 0024+1654 and A1689.

5.1 CL 0024+1654 

Fort, Mellier & Dantel-Fort (1997) have published number
counts in radial bins for this z = 0.39 cluster, in both the
I-band and the B-band. They present counts as surface
density profiles, and also provide the field number counts
(which we denote by $N_0$). For the I-band they only count
galaxies in the magnitude range 25 < I < 26.5, for the B-band
in the range 26 < B < 28. The main reason for this is
completeness, but also the fact that they find a field luminosity
function with a fairly constant slope for these intervals:
$S_I = 0.25 \pm 0.03$ and $S_B = 0.17 \pm 0.02$ respectively. These
values correspond to $\beta_I = -2.67$ and $\beta_B = -1.74$. Furthermore,
a giant arc is observed at 37 arcsec, but its redshift is
unfortunately unknown. However, Fort et al. (ibid.) argue
that it should be close to the mean for the background population,
so we can use its position to set the parity for our
estimators.
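The quoted exponents are consistent with $\beta = 1/(2.5S - 1)$, as the short sketch below confirms. It assumes the usual magnification-bias relation $N/N_0 = |\mu|^{2.5S - 1}$ for a count slope $S$, which is taken here as given rather than derived; the function names are illustrative.

```python
def count_exponent(slope):
    """beta = 1 / (2.5 S - 1) for a logarithmic count slope S."""
    return 1.0 / (2.5 * slope - 1.0)


def magnification_from_counts(count_ratio, slope):
    """Unsigned magnification |mu| from N/N0 = |mu|^(2.5 S - 1)."""
    return count_ratio ** count_exponent(slope)


beta_i = count_exponent(0.25)   # I-band slope
beta_b = count_exponent(0.17)   # B-band slope
mu_half = magnification_from_counts(0.5, 0.25)
```

Because both exponents are negative, a depletion of the counts ($N/N_0 < 1$) maps to a magnification larger than unity, and more steeply so in the I-band than in the B-band.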

The relative number counts $N/N_0$ in I and B are plotted
in Fig. 11 (left panels), along with the various estimators
for the convergence $\kappa$ (middle panels). Parity was assigned
according to the position of the arc, indicated by a short vertical
line in the figure. The number counts were smoothed
with the parities applied, as described in Section 4.2. This
smoothing is really necessary, as the galaxy number density
is fairly small: 6.13 arcmin$^{-2}$ for the B-band, 3.32 arcmin$^{-2}$
for the I-band, which is even smaller than for the simulation
example shown in Fig. 6. The number of bins used by Fort et
al. (ibid.) is too large, given these densities. The smoothing
length that corresponds to these number densities is about
40 arcsec and one arcmin respectively if one would try to
construct a number count map (see Section 4.2.3). For profiles,
the smoothing length necessary is a decreasing function
of radius. However, the number density of galaxies is
increasing with radius (due to the magnification effect). It
is therefore reasonable to use a constant smoothing length
for the profiles. We have adopted 20 arcsec for the B-band,
and 30 arcsec for the I-band.

The B-band data pose a problem for the $\gamma \propto \kappa^{1/2}$
estimator, as the counts rise all the way back to $N_0$ at about
10 arcsec. We therefore switch to the 'mean' estimator,
described in Section 4.1, at the position where the $\gamma \propto \kappa^{1/2}$
estimator becomes undefined. This switch is indicated by
a diamond symbol in the bottom central panel. This procedure
is unsatisfactory, of course, but it seems the best
possible alternative for this particular dataset, which is not
of high quality anyway.

From the convergence profiles we calculate projected
mass profiles, assuming that all background galaxies are at
$z_s = 1$. These are plotted in the right hand side panels
of Fig. 11, where the angular scale is transformed to a
physical scale, assuming that $\Omega_0 = 1$. We find that the
masses estimated from the B- and I-bands are roughly
consistent with each other: for both bands we obtain a
total projected mass within $0.3\,h^{-1}$ Mpc of approximately
$0.7$-$0.8 \times 10^{15}\,h^{-1}\,M_\odot$ using the heuristic estimator. If we
treat the $\gamma = 0$ and $\gamma = \kappa$ estimators as upper and lower
limits respectively, which in general is not correct, we have
an uncertainty of $0.2 \times 10^{15}\,h^{-1}\,M_\odot$ due to the uncertainty
in the choice of estimator.

Kassiola, Kovner & Fort (1992) published a mass model
for this cluster that was fitted to various lensing features.
They quote a total mass of about $10^{14}\,h^{-1}\,M_\odot$ within
$0.1\,h^{-1}$ Mpc. Our isothermal estimator gives roughly the
same mass within that radius, in both bands. It also gives
a mass of $1.6 \times 10^{15}\,h^{-1}\,M_\odot$ within $0.5\,h^{-1}$ Mpc, which we
should consider a lower limit as the isothermal estimator
typically underestimates masses.

Bonnet, Mellier & Fort (1994) have measured the tangential
shear profile for this cluster, and estimated the mass
within $1.5\,h^{-1}$ Mpc to be $1$-$2 \times 10^{15}\,h^{-1}\,M_\odot$, depending on
the assumption for the density profile. This range contains
the mass we find within a radius of $0.5\,h^{-1}$ Mpc, so in order
to be consistent, we need to assume that the mass within
$1.5\,h^{-1}$ Mpc is close to the upper limit given by Bonnet et al.
(ibid.), or that most of the cluster mass resides in the inner
$0.5\,h^{-1}$ Mpc.

However, looking at the counts (left hand panels of Fig.
11), we see that the counts never reach the $N_0$ given by Fort
et al. (1997). This is not too important for the central regions
of the cluster, where the counts are low anyway, but in
the outskirts it yields a significant contribution to the total
mass. If we take a value of $N_0$ to which the counts do converge,
which is 20 per cent smaller than the one taken from
Fort et al. (ibid.), we find that the mass within $0.5\,h^{-1}$ Mpc is
about $1.0 \times 10^{15}\,h^{-1}\,M_\odot$, while the mass within $0.1\,h^{-1}$ Mpc



[Figure 11 panels (plots not reproduced): panel title 'CL0024, I'; horizontal axes $\theta$ [arcmin] and $R$ [$h^{-1}$ Mpc].]



Figure 11. Convergence estimates for the cluster CL0024, in the I and B bands. The left hand panels show the number count profile,
where the dots represent the binned number counts published by Fort et al. (1996) and the solid lines the smoothed number counts
with parity taken into account (see text for details). The middle panels show the convergence estimates for the various assumptions
made: weak lensing (solid line), $\gamma = 0$ (dotted line), $\gamma = \kappa$ (dashed line), $\gamma \propto \kappa^{1/2}$ (dot-dash line). The same linetypes have been used
to plot the corresponding cumulative mass profiles in the right hand panels. The position of a giant arc is indicated by a vertical line.
The diamond symbol indicates a switch in estimators, explained in more detail in the main text.



does not change much. This seems a more consistent mass
estimate for CL 0024+1654 from the lens magnification. It
is therefore essential to get a better observational determination
for the field count $N_0$.

Fort et al. (ibid.) also published preliminary data on 
A370, but we believe these not to be of sufficient quality 
to attempt mass estimation, especially since the position of 
the minimum in the number counts is far removed from the 
position of a giant arc observed in this cluster (Soucail et al. 
1988). 

5.2 A1689 

We summarize the results for A1689, as published by Taylor
et al. (1998). The total mass within $0.24\,h^{-1}$ Mpc was found
to be $(0.5 \pm 0.09) \times 10^{15}\,h^{-1}\,M_\odot$, in fair agreement with X-ray,
virial, and weak shear mass estimates. The number of background
galaxies used was roughly similar to that used in the
count simulations presented in Section 4.2, and the uncertainties
are therefore quite similar, i.e. on the order of 50 per
cent. A1689 has a giant arc, but just as for CL 0024+1654,
the redshift of this arc is unknown, which leaves an ambiguity
in the interpretation. Note that the total exposure time
used to take the A1689 data was quite short (less than 3
hours in the V and I bands), so an improvement on this is
easily achieved.

6 CONCLUSIONS AND DISCUSSION 

In order to be able to estimate the lens convergence $\kappa$ from
the lens magnification $\mu$, we searched for a realistic approximate
relation between either of these quantities and the lens shear
$\gamma$, which needs to be taken into account in the strong lensing
regime. We looked at simple analytical lens models with exact
relations between the three lens quantities, notably the
$\gamma = 0$ approximation, corresponding to a uniform sheet of
matter, and the isothermal model, which has $\kappa = \gamma$.
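
For reference, the simplest inversions follow directly from the standard relation $\mu^{-1} = (1-\kappa)^2 - \gamma^2$. The sketch below writes them out, together with the weak lensing limit $\mu \simeq 1 + 2\kappa$. The function names and the `parity` argument are assumptions introduced here for illustration; since only $|\mu|$ is observable, the parity has to be supplied by hand.

```python
import math


def kappa_weak(mu_abs):
    """Weak lensing limit: mu ~ 1 + 2*kappa."""
    return 0.5 * (mu_abs - 1.0)


def kappa_sheet(mu_abs, parity=+1):
    """gamma = 0 (uniform sheet): 1/mu = (1 - kappa)^2.

    parity is the sign of (1 - kappa): +1 inside kappa < 1, -1 beyond it.
    """
    return 1.0 - parity / math.sqrt(mu_abs)


def kappa_isothermal(mu_abs, parity=+1):
    """gamma = kappa (isothermal): 1/mu = 1 - 2*kappa.

    parity is the sign of mu: +1 for kappa < 1/2, -1 for kappa > 1/2.
    """
    return 0.5 * (1.0 - parity / mu_abs)


mu = 4.0
print(kappa_weak(mu), kappa_sheet(mu), kappa_isothermal(mu))
```

For $|\mu| = 4$ and positive parity these give $\kappa = 1.5$, $0.5$ and $0.375$ respectively, which shows how strongly the weak lensing limit can overestimate the convergence once the magnification is large.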

As these do not provide adequate descriptions of either
observed or simulated clusters, we have studied the
lensing properties of a complete catalogue of galaxy cluster
models to find an approximation for the shear which still allows
a simple inversion from magnification to convergence.
An approximation that does just that is $\gamma \propto \kappa^{1/2}$, which leads
to a simple two-caustic magnification-convergence relation.
A disadvantage of this approximation is that two parities
have to be set, which can be a problem for observational
data, as the sign of the magnification cannot be directly
measured.
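
A sketch of how the two-caustic relation can be inverted is given below. It assumes the form $\mu^{-1} = (c - \kappa)(c^{-1} - \kappa)$ used in Appendix A, which is a quadratic in $\kappa$; writing $\gamma^2 = a\kappa$ in $\mu^{-1} = (1-\kappa)^2 - \gamma^2$ shows that $c + c^{-1} = 2 + a$. The function names, the arguments `parity` (sign of $\mu$) and `branch` (choice of root), and the value $c = 0.6$ are illustrative assumptions, not values from the fits in this paper.

```python
import math


def inverse_mu_sqrt(kappa, c):
    """Signed 1/mu for the two-caustic relation (caustics at kappa = c, 1/c)."""
    return (c - kappa) * (1.0 / c - kappa)


def kappa_sqrt(mu_abs, c, parity=+1, branch=-1):
    """Invert 1/mu = (c - kappa)(1/c - kappa) for kappa.

    parity: sign of mu (+1 outside the two caustics, -1 between them).
    branch: -1 selects the root below (c + 1/c)/2, +1 the root above it.
    Returns None where the estimator is undefined (negative discriminant).
    """
    s = c + 1.0 / c
    disc = s * s - 4.0 * (1.0 - parity / mu_abs)
    if disc < 0.0:
        return None
    return 0.5 * (s + branch * math.sqrt(disc))


c = 0.6  # hypothetical value, for illustration only
for kappa_true, parity, branch in [(0.3, +1, -1), (0.8, -1, -1), (1.4, -1, +1)]:
    mu_abs = abs(1.0 / inverse_mu_sqrt(kappa_true, c))
    print(kappa_true, kappa_sqrt(mu_abs, c, parity, branch))
```

The `None` return corresponds to the situation where the estimator becomes undefined: for negative parity, a measured $|\mu|$ that is too small has no real solution for $\kappa$.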

We also discussed the iterative technique, where one
starts with any of the estimators mentioned to obtain an estimate
for the shear using the thin lens approximation, and
then uses that shear to calculate a new convergence estimate.
However, this method is not likely to converge for observational
data because the dominant quadrupole structure
of the shear field will not be present in the observed magnification
field, which usually needs to be smoothed fairly
heavily. The iterative estimate is still useful, however, if one
performs a few steps only, or preferably just one.

Two types of tests were performed on all estimators 
considered: the ideal case for which full knowledge of the 
magnification distribution exists (including its sign), and the 
case where we start from mock lensed background galaxy 
counts, mimicking intrinsic problems like the clustering of 
these galaxies and shot noise due to their limited number.

The first type of test has shown that if the magnification
is perfectly known, the mass distribution of these
cluster models can be fairly well reconstructed using the
estimator based on a relation between shear and convergence of the
form $\gamma \propto \sqrt{\kappa}$, or the iterative estimator operated for one
step only. The isothermal estimate generally underestimates
the surface mass density for $\kappa > 0.3$, and overestimates for
smaller $\kappa$. The $\gamma = 0$ estimator overestimates for all $\kappa$ up
to the first caustic, and can therefore be considered a strict
upper limit for these $\kappa$.

The second type of tests aimed at mimicking observa- 
tional data, i.e. going from number counts to convergence 
estimates. We showed that it is still possible to reconstruct 
the convergence for simulated clusters, although the intrin- 
sic uncertainties become significant. We have also demon- 
strated that the mass estimated using the weak lensing ap- 
proximation is at least twice the real mass, thus showing 
that one really needs to go beyond the weak lensing approx- 
imation to get sensible cluster mass estimates. 

We used published number counts for the cluster CL 
0024+1654 (Fort et al. 1997) to illustrate the non-linear 
mass estimation technique. We showed that the total mass 
estimated compares fairly well to estimates from other tech- 
niques, even though the quality of the data is relatively poor. 
The uncertainties in the mass found for both this cluster and 
for A1689 (Taylor et al. 1997) are quite large still, but there 
are many ways to reduce these. We can get (photometric) 
redshifts for the background galaxies, find a better value for 
the field number count, obtain the redshifts of the arcs, and 
use wavelengths for which the luminosity function is either
much steeper or much shallower than the slope of 0.4 for
which there is no variation in number counts at all. Also,
we should go to fainter magnitudes, in order to get higher 
background galaxy number densities, which minimizes the 
clustering and shot noise problems. Furthermore, the lens 
becomes stronger as the convergence gets larger due to the 
higher mean background galaxy redshift. 
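
The role of the count slope and of the field count can be illustrated with a short sketch. It assumes the standard magnification-bias relation $N = N_0\,|\mu|^{2.5s-1}$, where $s$ is the logarithmic slope of the number counts, so that $s = 0.4$ gives no variation in the counts. The function name and all numbers in the example are hypothetical and are not taken from the CL 0024+1654 or A1689 data.

```python
def magnification_from_counts(n_obs, n_field, slope):
    """|mu| from the ratio of observed to field counts, N = N_0 |mu|^(2.5 s - 1)."""
    exponent = 2.5 * slope - 1.0
    if abs(exponent) < 1e-12:
        # For s = 0.4 the counts carry no information on the magnification.
        raise ValueError("slope of 0.4: counts are insensitive to magnification")
    return (n_obs / n_field) ** (1.0 / exponent)


# Illustrative numbers only: the same observed counts with two field counts.
slope = 0.25
n_obs = 70.0
for n_field in (100.0, 80.0):   # the second is 20 per cent lower
    mu_abs = magnification_from_counts(n_obs, n_field, slope)
    print(f"N_0 = {n_field:.0f}: |mu| = {mu_abs:.2f}")
```

The exponent $1/(2.5s-1)$ grows without bound as $s$ approaches 0.4, so a given fractional error in $N_0$ is amplified most strongly for slopes near that value, which is why slopes far from 0.4 are preferable.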

In conclusion, we showed that mass estimation from
lens magnification in the strong lensing regime is possible.
We seem to find consistent mass estimates for both
CL 0024+1654 and A1689 with only limited observational
data available. This should give us encouragement to obtain
better data for these and other clusters. Obviously, in combination
with other mass estimates the lens magnification
technique will be even more promising.

APPENDIX A: PROOF THAT SPHERICAL
LENSING POTENTIALS CANNOT HAVE
$\gamma \propto \kappa^{1/2}$

For spherically symmetric lensing potentials, both the lens
convergence $\kappa(x)$ and magnification $\mu(x)$ are functions of the
deflection angle $\alpha(x)$ and its first derivative (e.g. Schneider
et al. 1992):

$$\kappa(x) = \frac{1}{2}\left[\frac{\alpha(x)}{x} + \frac{d\alpha(x)}{dx}\right] \qquad \mathrm{(A1)}$$

and

$$\mu^{-1}(x) = \left[1 - \frac{\alpha(x)}{x}\right]\left[1 - \frac{d\alpha(x)}{dx}\right] . \qquad \mathrm{(A2)}$$

So if we want to have $\mu^{-1} = (c^{-1} - \kappa)(c - \kappa)$ (e.g. eq. 21),
we need to have the following two equalities:

$$\frac{\alpha(x)}{c\,x} = c\,\frac{d\alpha(x)}{dx} = \frac{1}{2}\left[\frac{\alpha(x)}{x} + \frac{d\alpha(x)}{dx}\right] . \qquad \mathrm{(A3)}$$

Eliminating $d\alpha(x)/dx$, we have the following condition for
the approximation $\gamma \propto \kappa^{1/2}$ to hold:

$$(c^2 - 2c + 1)\,\alpha(x) = 0 . \qquad \mathrm{(A4)}$$

Besides the solution $\alpha(x) = 0$, this condition is met only for
$c = 1$, for which the shear vanishes ($\gamma = 0$, corresponding to a sheet of
matter); it is not met for any other real $c$.
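
The elimination step can be written out explicitly. The lines below are a sketch in the notation of this appendix, where $\alpha' \equiv d\alpha(x)/dx$ is a shorthand introduced here, and the starting equalities are those required to match the two factors of the magnification:

```latex
\begin{align*}
\frac{\alpha}{c\,x} = c\,\alpha'
  \quad &\Rightarrow \quad \alpha' = \frac{\alpha}{c^{2}x} , \\
\frac{\alpha}{c\,x} = \frac{1}{2}\left(\frac{\alpha}{x} + \alpha'\right)
  \quad &\Rightarrow \quad
  \frac{\alpha}{c\,x} = \frac{1}{2}\left(1 + \frac{1}{c^{2}}\right)\frac{\alpha}{x} , \\
\text{multiplying by } 2c^{2}x: \quad
  & 2c\,\alpha = (c^{2} + 1)\,\alpha
  \quad \Rightarrow \quad (c^{2} - 2c + 1)\,\alpha = 0 .
\end{align*}
```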



